More on Ivy Bridge's New Graphics
Now that we’ve seen how work flows into an Ivy Bridge Processor Graphics unit, we can focus on what actually happens to it.
The Slice Common can be viewed as a hub for a stream processor cluster, as it holds some centralised resources, such as a rasteriser unit and a new Level 3 cache. This is a separate, dedicated unit, distinct from the Level 3 cache that the execution cores and the Processor Graphics share. ‘In Sandy Bridge we were going to put an L3 in, but we didn’t do it because we couldn’t find any real performance reason to do it,’ explained Piazza. ‘The most significant thing about the L3 cache is suppressing cycles out to the Ring [bus memory controller] and also raising your bar on performance per watt. Media applications also get a benefit from it – there are a lot of things in media that just hit the L3 cache.’
The Slice Common and one attendant Slice
Interestingly, Piazza also told us that Intel ‘can put down many sub-slices around a Slice Common, and we can put down lots of Slice Commons ultimately over time, with slices for further scale-ups.’ The Slice Common also incorporates at least the Z-complex, stencil complex and stencil back-end units.
Slices are the stream processor clusters that are arrayed around the Slice Commons. The rate of progress that Intel’s making was clear when Piazza told us that ‘you cannot count our shader number and ask what does that mean,’ by which he meant that the features and capabilities of the stream processors are evolving significantly from one design to the next.
This always happens when a stream processor design is updated – it’s the reason why an AMD Radeon HD 6970 2GB has 1,536 stream processors, while a previous-generation Radeon HD 5870 1GB has 1,600 stream processors – the units in the HD 6970 2GB are more capable, even if they’re also less numerous.
A less conceptual view of Ivy Bridge's Processor Graphics unit
The major update is the co-issue of FMAs (as described on page 4). This optimisation relies on having many threads available to attempt co-issue, which is the main reason Intel increased the thread count in the thread manager. That, and simply stuffing more work into the GPU per clock.
As well as the stream processors and attendant texture units, ‘there’s something called a media sampler,’ said Piazza, ‘so a lot of the post processing you do with media will talk to that – there are a lot of interesting filters that we use beyond 3D for that sampler.’
Piazza also said that Intel was comfortable with a 4-wide vector-based stream processor that’s quad-pumped. ‘If you take a look at tessellated workloads,’ he said, ‘it’s very, very convenient to have narrow engines rather than wider engines – wider engines sound more efficient, but they’re actually not.’
Piazza revealed that if the stream processor was dealing with a triangle with fewer than 16 pixels, it could step down to dual-pumping for extra speed.
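To see why narrow engines can win on small triangles, here is a rough sketch of lane utilisation. It is a toy model and not Intel's actual scheduling: we assume each instruction issue covers lanes × pumps pixels, and that any slots left over when a triangle runs out of pixels are wasted. The function names and the 16-wide comparison engine are hypothetical, chosen only for illustration.

```python
import math

LANES = 4  # 4-wide vector unit, as described by Piazza


def pumps_for(pixels: int) -> int:
    """Quad-pump normally; dual-pump for triangles with fewer than 16 pixels."""
    return 2 if pixels < 16 else 4


def utilisation(pixels: int, lanes: int, pumps: int) -> float:
    """Fraction of lane-slots doing useful work (toy model, an assumption)."""
    batch = lanes * pumps               # pixels covered by one instruction issue
    issues = math.ceil(pixels / batch)  # issues needed to cover the triangle
    return pixels / (issues * batch)


for pixels in (3, 6, 16, 40):
    narrow = utilisation(pixels, LANES, pumps_for(pixels))
    wide = utilisation(pixels, 16, 4)   # hypothetical 16-wide, quad-pumped engine
    print(f"{pixels:>3} px  narrow: {narrow:.2f}  wide: {wide:.2f}")
```

In this model, a six-pixel triangle fills three quarters of a dual-pumped 4-wide batch, but less than a tenth of the hypothetical wide engine's batch, which is the sort of waste that heavy tessellation would multiply.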
We expect there to be two flavours of Ivy Bridge Processor Graphics, as Piazza talked about GT1 and GT2 graphics units on both Sandy Bridge and Ivy Bridge. He said that the basic GT1 unit of Ivy Bridge will have the same number of samplers as the GT2 unit of Sandy Bridge, and that ‘when we do a GT2, you get twice the samplers’. We think this means that the basic Ivy Bridge graphics unit will have 12 stream processors (though with twice the FMA performance and all the other performance enhancements), while the GT2 model will have 24.
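As a back-of-the-envelope check on what those guesses would imply, the sketch below compares peak FMA rates. It rests entirely on our assumptions: 12 and 24 stream processors for the Ivy Bridge GT1 and GT2, co-issue doubling the FMAs per unit per clock, a 12-unit Sandy Bridge GT2 as the baseline, and equal clock speeds. The function name is hypothetical, and the result is a ceiling for FMA-bound work, not a prediction of real-world frame rates.

```python
def relative_fma_rate(units: int, fma_per_unit: int, clock_ratio: float = 1.0) -> float:
    """Peak FMA rate in arbitrary units: units x FMAs per unit per clock x clock."""
    return units * fma_per_unit * clock_ratio


# Baseline (assumed): Sandy Bridge GT2 with 12 units issuing one FMA each per clock.
baseline = relative_fma_rate(12, 1)

# Speculative Ivy Bridge configurations, with co-issue modelled as 2 FMAs per unit.
configs = {
    "Ivy Bridge GT1 (12 units)": relative_fma_rate(12, 2),
    "Ivy Bridge GT2 (24 units)": relative_fma_rate(24, 2),
}

for name, rate in configs.items():
    print(f"{name}: {rate / baseline:.1f}x Sandy Bridge GT2 at equal clocks")
```

Any clock speed increase would scale these ratios further via `clock_ratio`, while workloads that can't co-issue FMAs every clock would land well below them.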
We should also mention that as the entire Ivy Bridge die is made of 22nm 3D transistors, the same predictions on performance and power apply. That is, you either get a huge performance boost for the same power draw, a huge power saving for the same level of performance, or something in between. We expect Intel to ramp up the frequency of the Ivy Bridge Processor Graphics compared to the Sandy Bridge unit, thus adding more performance.